Abstract 

Background: Thymidylate synthase is encoded by a housekeeping gene, designated ancient due to its role in DNA synthesis 
and ubiquitous phyletic distribution. The genomic sequences coding for thymidylate synthase were characterized in 
two species of the genus Trichinella, the encapsulating T. spiralis and the non-encapsulating T. pseudospiralis. 

Methods: Based on the sequence of parasitic nematode Trichinella spiralis thymidylate synthase cDNA, PCR 
techniques were employed. 

Results: Each of the respective gene structures encompassed 6 exons and 5 introns located in conserved sites. 
Comparison with the corresponding gene structures of other eukaryotic species revealed lack of common introns 
that would be shared among selected fungi, nematodes, mammals and plants. The two deduced amino acid 
sequences were 96% identical. In addition to the thymidylate synthase gene, an intron-less retrocopy, i.e. a processed 
pseudogene, with a sequence identical to the T. spiralis gene coding region, was found to be present within the 
T. pseudospiralis genome. This pseudogene, instead of the gene, was confirmed by RT-PCR to be expressed in the 
parasite muscle larvae. 

Conclusions: Intron load, as well as distribution of exon and intron phases in thymidylate synthase genes from 
various sources, point against the theory of gene assembly by the primordial exon shuffling and support the 
theory of evolutionary late intron insertion into spliceosomal genes. Thymidylate synthase pseudogene expressed 
in T. pseudospiralis muscle larvae is designated a retrogene. 

Keywords: Trichinella spiralis, Trichinella pseudospiralis, Thymidylate synthase, Gene structure, Introns-late theory, 
Retrogene 



Background 

Trichinella spiralis and Trichinella pseudospiralis are 
two parasitic nematode species colonizing mammalian 
striated muscles. Transmission of Trichinella spp. to the 
next host occurs by ingestion of meat contaminated with 
the parasite muscle larvae. In the intestine, after mating, 
adult female worms give birth to the newborn larvae 
which migrate to the muscles to become the muscle larvae
[1]. T. spiralis is an encapsulating species whose 
muscle larvae reside in discrete structures called nurse 
cells, separated from myofibers by collagen capsules 
[2,3]. In contrast, muscle larvae of T. pseudospiralis migrate 
freely throughout the muscle tissue [1]. Our previous 
studies documented high thymidylate synthase activity 
persisting in developmentally arrested muscle larvae of 
both species, as well as in T. spiralis adult forms [4,5]. 
Thymidylate synthase (EC 2.1.1.45), catalyzing deoxyuri- 
dylate methylation to yield a DNA precursor, thymidy- 
late, is the only cellular source of de novo synthesis of the 
latter [6]. High enzyme activity is known to accompany 
proliferation, as well as to persist in certain growth- 
arrested systems where it is not associated with cell
division (cf. [5]). Consequently, while the enzyme expressed in 
embryos developing in the T. spiralis adult uterus may be 
considered a marker of proliferation, its localization to 
excretory-secretory organ, i.e. stichosome, of T. spiralis 
adult forms, as well as to gonads and stichosome primordial
cells of non-growing muscle larvae, appears to reflect 
the state of cell cycle arrest [7]. Of note, the high thymidylate
synthase activity found in T. spiralis and T. pseudospiralis
muscle larvae does not appear to vary in 
connection with the difference in the intracellular niche 
occupied by the two species. 

Thymidylate synthase belongs to the proteins most 
highly conserved in evolution [6]. It is encoded by a 
housekeeping gene, considered ancient by virtue of the 
enzyme's role in DNA synthesis and its ubiquitous 
phylogenetic distribution [8]. This implies that the gene 
was likely to play a role in the transition from an RNA 
to a DNA world and to exist in the common ancestor of 
modern organisms before kingdoms diverged [9]. Analyses
of evolutionary pathways of thymidylate synthase 
and other ancient genes may help delineate the origin
of modern DNA sequences and the molecular basis 
of their evolutionary dynamics [10]. Those genes are also 
used as models for studying spliceosomal intron distri- 
bution among phylogenetic lineages, as the issue of 
intron loss or gain during evolution of eukaryotic protein- 
coding genes remains unsettled [11]. Since the discovery 
of introns, two concepts of their origin have been consid- 
ered: (i) the introns-early theory or exon theory of genes 
assumes gene assembly by primordial exon shuffling in a 
common ancestor of bacteria and eukaryotes, and subse- 
quent massive intron loss over evolution, and (ii) the 
introns-late theory or insertional theory assumes consider- 
able intron gain during recent times [12]. Of note is that 
the two theories are also viewed as unlinked rather than 
incompatible, since introns present in the primordial 
genes might have been removed and new ones rapidly 
inserted in different positions [13]. It is also commonly 
agreed that the last eukaryotic common ancestor had a 
high intron density [14,15]. 

T. spiralis thymidylate synthase cDNA has been 
cloned, allowing determination of the nucleotide and de- 
duced amino acid sequences [16] that turned out to be 
identical with the gene coding sequence inferred from 
the recently published draft version of the T. spiralis 
genome [17]. Phylogenetic analysis of amino acid sequences
of the enzymes from different species
showed T. spiralis thymidylate synthase to 
branch off before other nematode species and display 
higher overall similarity with mammalian than other 
nematode enzymes [16]. Early divergence of the genus 
Trichinella in the evolution of the phylum Nematoda, 
as well as differences in the genome characteristics of 
Trichinella in relation to other nematode lineages were 
also documented by others [18,19]. 

The goal of the present study was to characterize the 
T. pseudospiralis thymidylate synthase cDNA and genomic
sequences, in order to analyze intron
distribution and compare it with that of the corresponding
T. spiralis gene determined previously [20]. Considering
the previously suggested evolutionary distinctness of 
the genus Trichinella within the phylum 
Nematoda (see above), it was also of interest to compare 
both Trichinella thymidylate synthase genes with the 
corresponding genes of various eukaryotic species. 

Methods 

Primer sequences 

IVTF: 5' GGGTCTAGACTTGAATGTTATAGATATTTATACAATG 3'
IVTR: 5' GGGAAGCTTCCATGAAATTTTATTTC 3'
RGENP: 5' GAGAGCGGCCGCCAATGACAGAAACTGTTCACAAATTAG 3'
RGENK: 5' AAAGCGGCCGCGATCACACAGCCATAGGCATTG 3'
RGEN5'1: 5' AAAAGCGGCCGCACGTAATCATCCTGAGATG 3'
RGEN5'2: 5' AAAAGCGGCCGCAGCTTTCAGAGAAGAATGTC 3'
RGEN3'1: 5' GAGAGCGGCCGCCTATGGCTGTGTGATCAATTG 3'
RGEN3'2: 5' GAGAGCGGCCGCCCAAATCACCTTCTTCATAATTG 3'
SYNEX1: 5' GGGGATCCATATGACAGAAACTGTTCACAAATTAG 3'
SYNEX2: 5' AAAAGCTTACACAGCCATAGGCATTGATA 3'

Biological material 

T. spiralis (ISS 569 155 569) and T. pseudospiralis 
muscle larvae were maintained and isolated, as previ- 
ously described [5]. 

Nucleic acids isolation 

Total RNA was isolated from T. pseudospiralis muscle 
larvae using TRIZOL reagent (Life Technologies) and 
genomic DNA from muscle larvae of both species was 
extracted using Wizard Genomic DNA Purification Kit 
(Promega), with the manufacturer's protocols applied in 
each case. 

Reverse transcription 

This was performed on total RNA prepared from T. 
pseudospiralis muscle larvae with SYNEX2 primer, in 
two rounds of 1 h incubation at 42°C [16], using MMLV 
reverse transcriptase (Promega). 

Polymerase Chain Reaction (PCR) 

In order to amplify T. pseudospiralis thymidylate synthase 
ds cDNA, as well as T. spiralis and T. pseudospiralis thy- 
midylate synthase genes, standard PCR on ss cDNA or 
genomic DNA was performed, using Taq DNA polymer- 
ase (Promega), as recommended by the enzyme manufac- 
turer. The following primer combinations and cycling 
conditions were applied: (i) in the case of T. pseudospiralis 
thymidylate synthase cDNA, SYNEX1 and SYNEX2 
primers were used, with initial 3 min at 95°C and the hot 
start steps, followed by 35 cycles of 30 sec at 95°C, 30 sec 



Jagielska et al. Parasites & Vectors 2014, 7:175 
http://www.parasitesandvectors.com/content/7/1/175 



at 57°C and 1 min at 72°C, (ii) in the case of T. spiralis 
gene, RGENP and RGENK primers were used, with initial 
2 min at 95°C and the hot start steps, followed by 3 cycles 
of 15 sec at 95°C, 15 sec at 60°C and 2 min at 72°C, and 
subsequent 29 cycles of 15 sec at 95°C, 1 min at 68°C and 
2 min at 72°C, (iii) in the case of T. pseudospiralis gene, 
IVTF and IVTR primers were used, with cycling condi- 
tions as in (ii), but for the annealing step being performed 
at 40°C in the initial 3 cycles and at 58°C in the subse- 
quent 29 cycles. For each amplification a negative control 
was included. Amplification products were analyzed by 
electrophoresis in Tris-acetate-EDTA buffered 0.8% agar- 
ose gels. 

Inverse PCR 

In order to determine T. spiralis thymidylate synthase 
gene 5' and 3' flanking regions, inverse PCR was used 
[21]. Genomic DNA extracted from T. spiralis muscle 
larvae was digested for 1 h at 37°C with Eco RI (Life 
Technologies), then ligated overnight at 16°C using 
phage T4 DNA ligase (Promega). A 1/100 dilution of the 
ligation mixture served as a template for PCR with Taq 
DNA polymerase (Promega), applied as recommended 
by the enzyme manufacturer. In the case of the 5' region 
amplification, RGEN5'1/RGEN5'2 primers and the fol- 
lowing cycling steps were applied: (i) 2 min at 95°C, 
followed by the hot start step, (ii) 3 cycles of 15 sec at 
95°C, 15 sec at 50°C and 2 min at 72°C, (iii) 29 cycles of 
15 sec at 95°C, 1 min at 68°C and 2 min at 72°C. In the 
case of the 3' region amplification, RGEN3'1/RGEN3'2 
primers were used and cycling conditions as described 
above, but for the annealing step during the initial 3 cycles 
performed at 60.5°C. For each amplification a negative 
control was included. Amplification products were ana- 
lyzed by electrophoresis in Tris-acetate-EDTA buffered 
0.8% agarose gels. 

Cloning and sequencing 

cDNA corresponding to T. pseudospiralis thymidylate 
synthase coding sequence was cloned into Bam HI and 
Hind III sites of pBluescript SK (+) phagemid (Agilent 
Technologies) propagated in E. coli DH5αF' strain. DNA 
amplicons corresponding to T. spiralis thymidylate syn- 
thase gene and gene flanking regions were cloned into 
Not I site of the same vector. DNA corresponding to T. 
pseudospiralis thymidylate synthase gene was cloned 
into the Xba I and Hind III sites of the same pBluescript SK (+) 
vector. Sequencing of cloned T. spiralis genomic DNA 
fragments was performed using Sequenase Version 2.0 
DNA Sequencing kit (Affymetrix) and sequencing of 
T. pseudospiralis thymidylate synthase cDNA and gen- 
omic DNA was done by DNA Sequencing and Oligonucleo- 
tide Synthesis core facility at the Institute of Biochemistry 
and Biophysics (Warsaw, Poland). 
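
The cloning sites named above can be cross-checked against the primer list. The following is only an illustrative sketch, not part of the original protocol: it assumes the standard recognition sequences of Bam HI, Hind III, Not I and Xba I, and the helper name sites_in is hypothetical. Primer sequences are those given under "Primer sequences", with line-wrap spaces removed.

```python
# Standard recognition sequences of the enzymes named in the cloning section.
SITES = {
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "NotI": "GCGGCCGC",
    "XbaI": "TCTAGA",
}

# Primers as listed in the Methods (5' to 3').
PRIMERS = {
    "IVTF": "GGGTCTAGACTTGAATGTTATAGATATTTATACAATG",
    "IVTR": "GGGAAGCTTCCATGAAATTTTATTTC",
    "RGENP": "GAGAGCGGCCGCCAATGACAGAAACTGTTCACAAATTAG",
    "RGENK": "AAAGCGGCCGCGATCACACAGCCATAGGCATTG",
    "SYNEX1": "GGGGATCCATATGACAGAAACTGTTCACAAATTAG",
    "SYNEX2": "AAAAGCTTACACAGCCATAGGCATTGATA",
}


def sites_in(primer: str) -> list:
    """Return the names of the listed enzymes whose site occurs in the primer."""
    return sorted(name for name, site in SITES.items() if site in primer)


for name, seq in PRIMERS.items():
    print(name, sites_in(seq))
```

Under these assumptions the SYNEX pair carries Bam HI/Hind III sites, the RGENP/RGENK pair carries Not I sites and the IVTF/IVTR pair carries Xba I/Hind III sites, in line with the cloning strategy described in this section.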



GenBank submissions 

T. spiralis and T. pseudospiralis thymidylate synthase 
gene sequences were deposited in the GenBank and are 
available under accessions [GenBank:AF406808] and 
[GenBank:KF186231], respectively. 

Protein secondary structure analysis 

Crystal structures of thymidylate synthase protein mono- 
mers complexed with enzyme substrate deoxyuridylate, 
dUMP, were obtained from Protein Data Bank (www. 
rcsb.org), under the accession codes [PDB:4G9U] for 
T. spiralis, [PDB:4E50] for M. musculus and [PDB:3HB8] 
for H. sapiens R163K mutated enzyme. Secondary structure 
element assignment for subunit A of each model, was 
determined by PDB managing system. Graphical struc- 
ture representation was obtained using VMD 1.8.7 soft- 
ware, implemented with "vmd_use_pdb_ss" script enabling 
reading of secondary structure elements from PDB files at 
http://www.ks.uiuc.edu/Research/vmd/. 

Ethical approval 

Ethical approval was granted by the First Warsaw Local 
Ethics Committee for Animal Experimentation at the 
Nencki Institute. 

Results 

T. spiralis and T. pseudospiralis thymidylate synthase 
genes share the same structure and show 11 
substitutions at the deduced amino acid sequence level 

T. spiralis and T. pseudospiralis thymidylate synthase 
genes each consist of 6 exons separated by 5 introns 
marked by GT/AG donor/acceptor splice sites (Figure 1). 
Corresponding exons are of the same length in both species, and 54 single
nucleotide substitutions in 52 codons are found between the two 
species within the 924 nt-long open reading frame 
(Additional file 1: Figure S1). Of those, 37 substitutions
are transitions, i.e. changes between 
two purines or two pyrimidines, and 17 are 
transversions, i.e. changes of a purine into a pyrimidine 
or vice versa. Transitions are thus about twice as frequent as 
transversions, resembling the pattern of nucleotide 
substitutions inferred for the human genome from analysis of its 
pseudogene sequences [22]. Among the 54 nucleotide
substitutions, 42 are silent and 12 result in changes 
of the deduced protein sequence. T. spiralis and T. pseudospiralis
thymidylate synthase amino acid sequences 
show 96% identity (Additional file 1: Figure S2). The 
entire 2794 nt-long T. spiralis thymidylate synthase 
gene sequence [GenBank:AF406808] shows 99% overall 
identity (BLAST comparison) with the sequence of the 
corresponding region of the T. spiralis genome draft
[GenBank:NW_003526941]. In both versions the sequences 
of exons, introns and the gene 3' flanking region are identical.
Several single nucleotide differences, i.e. 10 insertions, 
Figure 1 Structures of genes of T. spiralis and T. pseudospiralis thymidylate synthases. Exons and introns are shown as boxes and lines, 
respectively. The lengths of T. spiralis exons, as well as gene flanking and intronic regions, are given above and below the scheme, respectively. T. pseudospiralis 
intron lengths, when different from T. spiralis, are given in parentheses. The alignment of two sequences is shown as Additional file 1: Figure S1. 



1 substitution and 2 deletions, were found only within 
the gene 5' flanking region. Nine of those differences appear
at the sites of single nucleotide stretches, thus 
presumably resulting from sequence reading obstacles. 
Within 340 nt of the gene 5' flanking region a consensus 
TATA box was identified (Additional file 1: Figure S1), 
implying transcriptional regulation of the parasite gene 
to be different from that of mammalian genes, whose promoters are 
TATA-less [23]. 
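
The transition/transversion classification used above can be sketched in a few lines. This is a minimal illustration, not the procedure used in the study: the function names and the two toy sequences are hypothetical, and the sequences are assumed to be pre-aligned, gap-free and of equal length. Only the counts 37 and 17 come from the text.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify(a: str, b: str):
    """Return 'transition', 'transversion', or None for identical bases."""
    if a == b:
        return None
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def count_substitutions(seq1: str, seq2: str) -> dict:
    """Count transitions and transversions between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned and of equal length")
    counts = {"transition": 0, "transversion": 0}
    for a, b in zip(seq1.upper(), seq2.upper()):
        kind = classify(a, b)
        if kind is not None:
            counts[kind] += 1
    return counts


# Toy (hypothetical) 9-nt fragments, not taken from either Trichinella gene.
toy_counts = count_substitutions("ATGACAGAA", "ATGGCCGAT")

# Ratio implied by the counts reported in the text (37 transitions, 17 transversions).
reported_ratio = 37 / 17
print(toy_counts, round(reported_ratio, 2))
```

With the reported counts the ratio is about 2.2, which is the basis for describing transitions as roughly twice as frequent as transversions.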

Two conserved introns and one homologous intron are present 
in Trichinella and mammalian thymidylate synthase genes 

Intron distribution within T. spiralis/T. pseudospiralis 
thymidylate synthase genes was compared with gene 
structures of other species (Figure 2). The genes of the 
free-living nematode C. elegans and the fungus F. neoformans,
the latter being an airway pathogen of immunocompromised
patients, carry a low intron load, i.e. 2, 
vs. 4 present in another airway-pathogenic fungal species, 
P. carinii, and 5 present in the Trichinella gene. The 
genes of two other parasitic nematodes, B. malayi and 
L. loa, contain 6 introns, as is also the case for mamma- 
lian genes, including human (only B. taurus carries one 
additional intron). Plant genes appear the richest in in- 
trons, with D. carota carrying 8 introns within the thy- 
midylate synthase part of its bifunctional dihydrofolate 
reductase/thymidylate synthase gene. Conserved introns, 
i.e. present in exactly the same positions, or homologous 
introns, i.e. those whose positions are shifted (slid) by sev- 
eral nucleotides, are found among animal and fungal 
genes. Among nematode thymidylate synthase genes four 
conserved introns are present in Trichinella, B. malayi 
and L. loa genes. Out of those four, two introns corre- 
sponding to Trichinella introns 2 and 4, are also found in 
C. elegans gene, as its only intron burden. Within mam- 
malian thymidylate synthase genes all introns are con- 
served, with the exception of B. taurus intron 7 which 
remains shifted by 3 nt in relation to the terminus of the 
corresponding intron 6 present in other mammalian 
species. The genes of nematodes, with the exception of 
C. elegans, and those of fungi and mammals contain one 
conserved intron, corresponding to Trichinella intron 3. 
This most highly conserved intron is located 11 nt up- 
stream from the junction site of the exon carrying the 
region coding the enzyme active center. Both plant genes, 
those of A. thaliana and D. carota, contain an intron located 
12 nt downstream, and P. carinii carries an additional in- 
tron located 8 nt downstream from that region (intron po- 
sitions marked on the aligned amino acid sequences are 
provided as Additional file 1: Figure S3). Thus P. carinii 
enzyme active center is encoded by a separate, 40 nt-long 
exon. The distance between animal/fungal and plant introns
situated on the opposite sides of this region is 44 nt. 
Apart from Trichinella intron 3, animal (with the excep- 
tion of C. elegans) thymidylate synthase genes contain an 
additional conserved intron, corresponding to Trichinella 
intron 5. Additionally, a homologous intron, correspond- 
ing to Trichinella intron 2, is present in animal genes, 
with 3 nt-shift in mammalian vs. nematode genes. Besides, 
a conserved intron, absent from Trichinella and C. elegans 
genes, is present in B. malayi, L. loa and mammalian 
genes as intron 5. In regard to other intron position hom- 
ology, F. neoformans and Trichinella genes carry intron 1 
shifted by 5 nt between two genes. Introns in plant A. 
thaliana and D. carota thymidylate synthase, parts of the 
bifunctional genes, are found in the conserved positions 
with the exception of an additional D. carota intron 7. Yet 
no common intron is found for plant vs. animal and fungal
genes. Intron positions may break the coding sequence 
between codons (intron phase 0), or after the first or the 
second base pair (intron phase 1 or 2, respectively) [24]. 
In fungal and animal thymidylate synthase genes, with the 
exception of those of B. malayi, L. loa and B. taurus, the 
intron phases other than 0 are dominating or equally 
frequent. In B. malayi/L. loa, B. taurus, A. thaliana 
and D. carota genes, phase 0 introns prevail, reaching 
the numbers 4 out of 6, 4 out of 7, 5 out of 7 and 5 out 
of 8, respectively. In analogy to the intron phases, also 
exon phases are defined. The exons beginning and end- 
ing with the same phase are called symmetric, in con- 
trast to asymmetric ones, which begin and end with a 
different phase [24]. In Trichinella and A. thaliana thymi- 
dylate synthase genes symmetric exons prevail, reaching 
the numbers 4 out of 6 and 5 out of 8, respectively. In the P. 
carinii gene, asymmetric exons significantly prevail (4 out 
of 5). In B. malayi, L. loa and mammalian genes, asymmetric
exons are more frequent (4 out of 7). This is also the 
case for C. elegans (2 out of 3), F. neoformans (2 out of 3) 
and D. carota (5 out of 9) genes. In B. taurus the frequency 
of both types of exons is equal. 
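
The definitions of intron phase and exon symmetry given above can be made concrete with a short sketch. It is illustrative only: the function names and the toy intron offsets are hypothetical and do not correspond to any of the genes analyzed. It also assumes that the first exon starts in phase 0 and the last exon ends at the end of an ORF whose length is a multiple of 3.

```python
from typing import List, Tuple


def intron_phase(coding_bases_before: int) -> int:
    """Phase 0: intron falls between codons; phase 1 or 2: after the 1st or 2nd base."""
    return coding_bases_before % 3


def exon_phase_pairs(intron_offsets: List[int], cds_length: int) -> List[Tuple[int, int]]:
    """Return (start_phase, end_phase) for every exon of a coding sequence.

    intron_offsets holds, for each intron, the number of coding bases
    that precede it. The first exon is assumed to start in phase 0.
    """
    phases = [intron_phase(p) for p in sorted(intron_offsets)]
    starts = [0] + phases
    ends = phases + [cds_length % 3]
    return list(zip(starts, ends))


def is_symmetric(pair: Tuple[int, int]) -> bool:
    """An exon is symmetric when it begins and ends with the same phase."""
    return pair[0] == pair[1]


# Toy (hypothetical) 30-nt coding sequence interrupted after 6, 13 and 19 bases.
pairs = exon_phase_pairs([6, 13, 19], 30)
symmetry = [is_symmetric(p) for p in pairs]
print(pairs, symmetry)
```

In this toy case the introns have phases 0, 1 and 1, so the first and third exons are symmetric while the second and fourth are asymmetric.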
Figure 2 Comparison of gene structures of various thymidylate synthases. Exons are boxed and introns are shown as lines. The introns are 
annotated below the gene schemes with the number of codons they break within particular genes and with their phase, being 0 when falling 
between codons, and 1 or 2 when interrupting codons. Introns located in conserved or homologous positions, are marked with the same line 
pattern and the intron number within particular species gene. The positions of slide introns (5-nt slide in the case of Trichinella intron 1 vs. F. 
and their phases given in the brackets. Number key within particular exon box indicates location of the enzyme active center. Dihydrofolate 
reductase (DHFR) and thymidylate synthase (TS) parts of bifunctional plant genes are marked below the gene schemes. Gene sequences that 
with marked intron positions is shown as Additional file 1: Figure S3. 

Lack of correlation among location of conserved and 
homologous introns and the protein secondary structure 
entities 

Location within the amino acid sequence of the splicing sites of 
three introns, two conserved and one homologous 
among T. spiralis, M. musculus and H. sapiens thymidylate
synthase genes, and assignment of those sites to the 
protein spatial structure in enzyme models obtained 
from PDB, were examined (Figure 3). T. spiralis intron 2 
and the homologous mammalian intron 1, both being phase 
1 introns (Figure 2), do not separate secondary structure 
elements. The former lies within a helix and the latter is 
located within a region assigned in PDB files to neither 
helical nor β-sheet form. While Trichinella intron 3 is 
also located within a helix, the mammalian intron 4 conserved 
with it is located at the edge of a helix, despite both
introns being of phase 1. Trichinella intron 5 and the conserved
mammalian intron 6, both being phase 0 introns, 
are located at the edge of helices. No correlation can 
thus be inferred among intron locations, their phases, 
and protein secondary structure element borders. 

Thymidylate synthase retrogene is expressed instead of 
the gene in T. pseudospiralis muscle larvae 

PCR on T. pseudospiralis genomic DNA resulted in two 
products: a 1220 nt-long amplicon, corresponding to the 
gene region encompassing exon and intron sequences, 
and a 924 nt-long amplicon with a sequence identical to the 
T. spiralis thymidylate synthase ORF, designated a 
pseudogene (Figure 4). RT-PCR with RNA isolated from 
T. pseudospiralis muscle larvae as the starting material
resulted in a single product (not shown), and sequencing
of several bacterial clones revealed its sequence to 
be identical with that of T. spiralis thymidylate synthase 
cDNA. In view of the latter, a processed pseudogene is 
apparently expressed in T. pseudospiralis muscle larvae. 
Consequently, its genomic sequence is referred to as a retrogene.
Although none of the clones sequenced carried a 
sequence that would correspond to the T. pseudospiralis 
thymidylate synthase gene, transcription of the gene cannot
be unequivocally excluded. 
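
The size difference between the two genomic amplicons reported above can be expressed as simple arithmetic. This is only an illustrative sketch: it assumes that both lengths are measured between the same ORF ends, so that the difference reflects intron sequence alone, and the function names are hypothetical.

```python
ORF_LENGTH = 924        # nt, intron-less product (retrocopy or cDNA), from the text
GENE_AMPLICON = 1220    # nt, T. pseudospiralis gene product, from the text


def intron_content(gene_amplicon: int, orf_length: int) -> int:
    """Total intron length implied by the two amplicon sizes."""
    return gene_amplicon - orf_length


def classify_amplicon(length: int, orf_length: int = ORF_LENGTH) -> str:
    """Label a product amplified with ORF-end primers by its length alone."""
    if length == orf_length:
        return "intron-less"
    if length > orf_length:
        return "intron-containing"
    return "unexpected"


print(intron_content(GENE_AMPLICON, ORF_LENGTH))
print(classify_amplicon(924), classify_amplicon(1220))
```

Under that assumption, the five T. pseudospiralis introns would add up to 296 nt between the primer sites.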

Discussion 

In the present study T. spiralis and T. pseudospiralis thy- 
midylate synthase genes were found to share exon- 
intron structure. Comparison with the gene structures of 
other eukaryotes, including animal, fungal and plant spe- 
cies, revealed lack of a common intron. However, a con- 
served intron is found in the vicinity of the region 
encoding the enzyme active center in nematode (with 
the exception of C. elegans), fungal and mammalian 
genes. In T. pseudospiralis genomic DNA, apart from 
the gene, a thymidylate synthase retrogene was identified, 
with a sequence identical to the T. spiralis thymidylate synthase
ORF. This retrogene was found to be expressed, 
instead of the gene, in the parasite muscle larvae. 

Comparison of intron distribution among Trichinella 
and other eukaryotic thymidylate synthase genes shows 
lack of a conserved or even slid intron that would be 
shared by the organisms included in the analysis, 




Figure 3 Superimposition of T. spiralis (magenta), mouse (cyan) and human (grey) thymidylate synthase structure models of subunits 
A of dUMP complexes, obtained through the accessions [PDB:4G9U], [PDB:4E50], [PDB:3HB8], respectively. Secondary structure elements 
are distinct with helices shown as ribbons, β-sheets as wide parallel arrows and unclassified regions shown as lines. Derivative sites of homologous 
intron located at Asp-63 and two conserved introns located at Asp-180 and Gln-262/Met-263 (each amino acid and its number is given for T. spiralis 
sequence), are marked in yellow with corresponding structural elements shown aside in enlargement. 

Figure 4 Electrophoretograms of PCR products resulting from 
amplification of T. spiralis and T. pseudospiralis genomic DNA, 
performed with primers specific for the ends of T. spiralis 
thymidylate synthase ORF. The accurate lengths, based on 
sequencing, are given for T. spiralis and T. pseudospiralis (in parentheses) 
genes, and for T. pseudospiralis retrogene. 



representing animal, plant and fungal crown groups. The 
intron sliding, termed also intron slippage or drifting, in- 
dicates either preexisting intron relocation to a nearby 
position or its replacement by a new intron at a nearby 
position. This phenomenon is proposed to be associated 
with alternative splicing, proceeding by reverse splicing 
mechanism, i.e. insertion of a spliced-out intron into 
gene transcript, followed by reverse transcription and 
homologous recombination. Intron slippage is consid- 
ered to reflect intron gain events rather than indicate lo- 
cation of an ancient intron, i.e. being remnant from the 
eukaryotic common ancestor [11,15,25]. Among the spe- 
cies studied, the most highly conserved is the intron cor- 
responding to Trichinella intron 3, present also in other 
nematodes (with the exception of C. elegans), fungal and 
mammalian genes. Being absent from plant genes, it cannot 
be considered ancient; rather, it appears to be a relic 
from the common ancestor of the fungal and animal evolutionary
lineages [26]. Commonality of intron load within 
animal lineage only is further evident, based on the pres- 
ence in all nematodes and mammals of an intron hom- 
ologous to Trichinella intron 2 and conservation of two 
other introns, one corresponding to Trichinella intron 5 
and present in all other species, and the other, absent 
from Trichinella and C. elegans genes but present in 
B. malayi, L. loa and mammalian genes as intron 5. An 
overall lower intron number in C. elegans than in other 
nematode thymidylate synthase genes (2 vs. 5/6) remains
in agreement with the notion that C. elegans phylogeny,
unlike that of other nematodes, is characterized by 
extensive intron loss and restricted intron gain [14,27]. 
Thus excluding C. elegans, which does not seem to be a 
representative model for spliceosomal intron studies, it 
can be inferred that thymidylate synthase genes of all 
animal species included in the analysis represent a simi- 
lar intron location pattern. Also plant thymidylate syn- 
thase genes show apparently kingdom-specific intron 
distribution. Therefore, it can be hypothesized that loca- 
tions of introns in thymidylate synthase genes represent 
new insertional events, occurring independently in ani- 
mal and plant kingdoms, with fungi sharing possibly a 
common thymidylate synthase gene origin with animals. 
The aforementioned conservation of Trichinella intron 3 
in nematodes (with the exception of C. elegans), fungal 
and mammalian genes may be associated with the down- 
stream proximity of the enzyme active center-coding re- 
gion. Also in plant A. thaliana and D. carota genes, 
there is an intron located proximally, but downstream 
from that region highly conserved among various species 
[6]. The distance (44 nt) between locations of fungal/ 
animal and plant introns, on the opposite sides of the active
center-coding sequence, is too long to result from 
intron sliding (up to 15 nt are allowed for a slide), and 
rather too short for coding a functional peptide, consid- 
ering an assumed minimum evolutionary exon length of 
45 base pairs [26]. Therefore, it is not likely that both 
introns were present in the last common eukaryotic 
ancestor. Their locations seem to point rather to late in- 
sertional events, occurring in the vicinity of 21 base 
pairs-long region coding the enzyme active center, which 
remained uninterrupted due to selection pressure. Add- 
itionally, only in Trichinella and plant genes the active 
center-containing exon is symmetric, unlike in other 
nematode, fungal and mammalian genes, where it is 
asymmetric. Exon symmetry is claimed to be required 
for maintenance of the reading frame in the case of exon 
shuffling [24]. Thus in the majority of thymidylate syn- 
thase genes analyzed, the active center-coding exon does 
not conform to this criterion. According to the exon 
shuffling theory, not only symmetric exons but also 
phase 0 introns are believed to be favored [24]. Such intron 
positions are not predominant in Trichinella, C. elegans, 
fungal and mammalian genes, except in that of B. taurus, 
but in B. malayi, L. loa, B. taurus and plant genes phase 0 
introns prevail. However, a high number of phase 0 introns 
occurs in intron-rich regions and appears correlated with 
higher overall intron load. Also, in animal and plant thymi- 
dylate synthase genes, short exons, associated with phase 0 
intron-rich regions, tend to be symmetric. Thus, based 
on the distribution of intron and exon phases within 
thymidylate synthase genes, and in light of the lack of an 
ancient intron, the rules apparently governing evolutionary
exon shuffling seem to apply also to late intron insertion
events. A positive correlation between high intron 
density and species developmental complexity has been 
established [13] and an overall high intron complement, 
evident for mammalian and plant thymidylate synthase 
genes, conforms to that rule. However, the intron load, 
similarly high in nematodes, with the exception of C. ele- 
gans, and mammalian genes, casts doubt on this statement. 
Interestingly, while the two Trichinella species display thy- 
midylate synthase gene structure similar to other nema- 
todes (with the exception of C. elegans), the enzyme 
sequence similarity analysis implicates early evolutionary 
divergence of the genus Trichinella within the phylum 
Nematoda. This observation indicates that sequence and 
gene structure evolution may not be closely linked phe- 
nomena. In summary, analysis of gene structures of various 
thymidylate synthase genes provides support for the 
introns-late theory, pointing to a recent acquisition of the 
introns in the course of eukaryotic spliceosomal gene evo- 
lution. This conclusion is also supported by an apparent 
lack of correlation between conserved or homologous in- 
tron locations and the positions of the edge amino acid res- 
idues of thymidylate synthase protein secondary structure 
entities. In respect to the latter, intron insertion seems to 
proceed rather as a stochastic event. Analogous conclu- 
sions were inferred also from the intron load pattern in 
other ancient housekeeping genes, including actin [28], 
glyceraldehyde-3-phosphate dehydrogenase [29], triose- 
phosphate isomerase [12,26,30] and tubulin [31]. 
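
The distance argument made in this section can be verified with the figures quoted in the text. The sketch below is illustrative only: it assumes that the 11-nt and 12-nt (or 8-nt) distances are measured from the respective introns to the near end of the 21-bp active center-coding region, an interpretation that reproduces the 44-nt and 40-nt values reported above. Variable names are hypothetical.

```python
ACTIVE_CENTER_BP = 21   # region coding the enzyme active center
UPSTREAM_GAP = 11       # animal/fungal intron (Trichinella intron 3) to the region
PLANT_GAP = 12          # region to the plant intron located downstream
P_CARINII_GAP = 8       # region to the additional P. carinii intron
MAX_SLIDE = 15          # maximum shift allowed for intron sliding, per the text
MIN_EXON = 45           # assumed minimum evolutionary exon length [26]

# Distance between the animal/fungal intron and the plant intron.
plant_distance = UPSTREAM_GAP + ACTIVE_CENTER_BP + PLANT_GAP

# Length of the separate P. carinii exon encoding the active center.
p_carinii_exon = UPSTREAM_GAP + ACTIVE_CENTER_BP + P_CARINII_GAP

too_long_for_slide = plant_distance > MAX_SLIDE
too_short_for_exon = plant_distance < MIN_EXON
print(plant_distance, p_carinii_exon, too_long_for_slide, too_short_for_exon)
```

Both conditions hold for the 44-nt distance, which is the basis for arguing that the two introns are unlikely to derive from one ancestral intron or to have flanked an ancestral exon.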

The present paper reports on a thymidylate synthase retrogene
identified in T. pseudospiralis muscle larvae. 
Pseudogenes are defective copies of functional genes that 
accumulated in the genomes of many modern organ- 
isms, especially mammals. They tended to be considered 
as molecular relics, accumulating numerous mutations 
due to release from selection pressure [32]. Pseudogenes 
may arise either by duplication of genomic DNA or by 
retroposition, i.e. reverse transcription primed at poly A 
tails of an intron-less transcript, followed by insertion of 
a transposable element into another genomic location. 
Pseudogenes arising by the second mechanism, called 
processed pseudogenes or retropseudogenes, display a 
low survival rate [33,34]. Only 10% of protein coding 
genes of the human genome have been estimated to 
have at least one processed pseudogene. Processed 
pseudogenes of highly expressed housekeeping genes, 
including those coding for ribosomal proteins, keratin, 
glyceraldehyde-3-phosphate dehydrogenase and actin, 
account for the majority of the total number of ~8000 
human processed pseudogenes [35]. The mouse genome 
was estimated to contain ~5000 processed pseudogenes 
[36]. A thymidylate synthase processed pseudogene 
incapable of encoding functional enzyme was reported 
for a mouse fibroblast cell line [37]. Among human and 
mouse genes, those having multiple copies of processed 
pseudogenes are predominantly housekeeping genes 
highly expressed in the germ lines or embryonic cells 
[36]. Although the vast majority of pseudogenes remain 
functionally inactive, evidence is accumulating for 
the abundance of functional processed pseudogenes, 
called retrogenes, especially in mammals and insects 
[33]. In particular, 20% of human genome processed 
pseudogenes are believed to be expressed [34]. 

To the best of our knowledge, this is not only the first retro- 
gene of a housekeeping gene, but also the first pseudo- 
gene in general, described for the genus Trichinella. The C. 
elegans genome was estimated to contain ~2000 pseudo- 
genes [38]. However, this number may not be meaning- 
ful for Trichinella, whose thymidylate synthase protein 
sequence [16], as well as gene structure (vide supra), are 
apparently divergent from those of C. elegans. Understand- 
ing the mode of thymidylate synthase retrogene insertion 
into the T. pseudospiralis genome and its transcriptional regu- 
lation requires further investigation. However, as the muscle 
larvae of this species move freely within the muscle tissue, 
in contrast to T. spiralis muscle larvae being confined to a 
capsule, it is possible that the retrogene expression 
accounts for the lifestyle adaptation (cf. [39]). In the ab- 
sence of the capsule, overall transcription regulation may 
require hypertranscription of thymidylate synthase gene, in 
order to maintain high enzyme activity, and consequently 
high levels of its product. This type of regulation is known 
to take place for retrogenes of broadly expressed house- 
keeping genes, located to the autosomal chromosomes 
during and after male meiosis [33]. This reasoning corre- 
sponds with a notion that the energy demands of transcrip- 
tion and splicing may favor selection for compactness in 
the case of highly expressed and/or rapidly regulated genes 
[40,41], and thymidylate synthase was identified as highly 
expressed in Trichinella muscle larvae. 

The mechanisms of both phenomena discussed in this 
paper in regard to Trichinella thymidylate synthase gen- 
omic sequences, i.e. relatively high intron load and retro- 
position, are underlain by reverse transcriptase activity. 
Thus, high levels of reverse transcription seem to shape 
genome evolution of this lineage. Analysis of other proc- 
essed pseudogenes and their flanking sequences is re- 
quired, in order to identify a putative general mode of 
retroposition specific for this genus. In mammals, in- 
volvement of the L1 LINE retrotransposon has been 
established [42,43]. 

Conclusions 

Intron load and distribution within thymidylate synthase 
genes of various species display kingdom-specific pat- 
terns with no conserved or homologous introns shared 
among fungal, animal and plant genes. Locations of con- 
served and homologous introns within animal genes do 
not show correlation with the borders of the enzyme protein 
structural motifs. This allows us to conclude that intron 
insertion into thymidylate synthase genes represents evolu- 
tionarily late gain events, being rather stochastic in regard 
to the enzyme structure. Identification of a thymidylate 
synthase retrogene in T. pseudospiralis muscle larvae 
points to a possibility that compactness of genomic se- 
quence coding for this enzyme may reflect larval adapta- 
tion to existence within the muscle tissue in a non- 
encapsulated form. 
Nucleotide positions differing between the genes of the two species are 
indicated by asterisks. Alignment was performed using Clustal X software 
and Genomatix MatInspector software served for consensus TATA box 
identification. Figure S2. Alignment of T. spiralis and T. pseudospiralis 
thymidylate synthase amino acid sequences. Amino acid substitutions are 
marked with asterisks. Enzyme conserved folate and deoxyuridylate (active 
center) binding sites are marked in bold. Figure S3. Alignment of amino acid 
sequences of various thymidylate synthases performed in Clustal X software. 